Fast Semantic Relatedness: WordNet::Similarity vs Roget's Thesaurus
Abstract
A Measure of Semantic Relatedness (MSR) automatically determines how close two words are in meaning. MSRs are used in Natural Language Processing (NLP) problems such as word-sense disambiguation and text summarization. Solving such problems may require millions of relatedness scores, yet MSR run-time, clearly a major concern, has rarely been considered in NLP research. To evaluate an MSR, one often assigns relatedness scores to word pairs and measures the correlation with human-assigned scores. The WordSimilarity-353 test collection [1] is a well-known evaluation set of 353 word pairs with given relatedness scores. Spearman’s correlation can be calculated on it, and Fisher’s r-to-z transformation can be used to measure significance. We evaluate the run-time performance of eleven MSRs previously evaluated with respect to correlation [2]. Resources like WordNet and Roget’s Thesaurus have often been used to create MSRs. Ten MSRs are implemented in the WordNet::Similarity package [3], and one MSR uses the 1911 Roget’s Thesaurus [2]. Roget’s MSR calculates relatedness between two word-senses (appearances of a word in the Thesaurus) in constant time after finding them in an index. WordNet-based MSRs vary in complexity. A few could be implemented to run about as fast as Roget’s MSR, but WordNet::Similarity seldom does so. Our comparison of two popular MSR packages aims to inform researchers and developers who work on time-sensitive NLP applications.
Body: The fastest and the best-correlated WordNet MSRs take, respectively, 82 and 182 times longer than Roget’s MSR, yet all are statistically equivalent.
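The evaluation procedure mentioned in the abstract (Spearman's correlation against human scores, then Fisher's r-to-z transformation for significance) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names and the sample scores are hypothetical, the z-test treats the two correlations as coming from independent samples (a simplification when both MSRs are scored on the same 353 pairs), and applying Fisher's transformation to a rank correlation is itself an approximation.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def fisher_z_test(r1, r2, n1, n2):
    """Two-sided test of H0: both correlations are equal.

    Assumes independent samples of sizes n1 and n2 (a simplification).
    Returns the z statistic and the p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's r-to-z transformation
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical scores for five word pairs (not taken from WordSimilarity-353).
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
msr_scores = [0.1, 0.4, 0.3, 0.8, 0.9]
rho = spearman(human_scores, msr_scores)

# Hypothetical correlations for two MSRs, each scored on 353 pairs.
z, p = fisher_z_test(0.5, 0.4, 353, 353)
print(f"rho = {rho:.3f}, z = {z:.3f}, p = {p:.3f}")
```

Under this kind of test, two MSRs whose correlations differ by a p-value above the chosen threshold would be regarded as not significantly different, which is the sense in which measures can be called statistically equivalent.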
Similar resources
Evaluation of Automatic Updates of Roget's Thesaurus
Keywords: lexical resources, Roget's Thesaurus, WordNet, semantic relatedness, synonym selection, pseudo-word-sense disambiguation, analogy. Thesauri and similarly organised resources attract increasing interest from Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, autom...
Mapping Roget's Thesaurus and WordNet to French
Roget’s Thesaurus and WordNet are very widely used lexical reference works. We describe an automatic mapping procedure that effectively produces French translations of the terms in these two resources. Our approach to the challenging task of disambiguation is based on structural statistics as well as measures of semantic relatedness that are utilized to learn a classification model for associat...
A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity
This paper presents the results of using Roget's International Thesaurus as the taxonomy in a semantic similarity measurement task. Four similarity metrics were taken from the literature and applied to Roget's. The experimental evaluation suggests that the traditional edge counting approach does surprisingly well (a correlation of r=0.88 with a benchmark set of human similarity judgements, with...
Exploring Noun-Modifier Semantic Relations
We explore the semantic similarity between base noun phrases in clusters determined by a comprehensive set of semantic relations. The attributes that characterize modifiers and nouns are extracted from WordNet and from Roget's Thesaurus. We use various machine learning tools to find combinations of attributes that explain the similarities in each category. The experiments gave promising results, w...
Journal: TinyToCS
Volume: 1
Publication date: 2012